Overview

Dataset statistics

Number of variables: 22
Number of observations: 101
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 63.6 KiB
Average record size in memory: 644.4 B

Variable types

Text: 1
Numeric: 9
Categorical: 9
DateTime: 2
Boolean: 1

Dataset

Description: JHB_SCHARP_004 - Quality-corrected harmonized data
Creator: RP2 Clinical Data Quality Team
Author: Quality-Checked Data
URL: HEAT Research Projects

Variable descriptions

Age (at enrolment): Patient age at study enrollment
CD4 cell count (cells/µL): CD4+ T lymphocyte count (missing codes removed)
HIV viral load (copies/mL): HIV RNA copies per mL (missing codes removed)
BMI (kg/m²): Body Mass Index (extreme values removed)
Waist circumference (cm): Waist circumference (corrected from mm to cm)
ALT (U/L): Alanine aminotransferase (missing codes removed)
Platelet count (×10³/µL): Platelet count (missing codes removed)
Hematocrit (%): Hematocrit (zero values removed)
Lymphocyte count (×10⁹/L): Lymphocyte absolute count (corrected labeling)
Neutrophil count (×10⁹/L): Neutrophil absolute count (corrected labeling)
cd4_correction_applied: Quality flag: CD4 missing codes removed
final_comprehensive_fix_applied: Quality flag: Comprehensive corrections applied
waist_circ_unit_correction_applied: Quality flag: Waist circ unit corrected
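Several of the descriptions above say that missing codes were removed. The report does not list the codes themselves. The minimal stdlib sketch below shows one way such sentinel codes could be turned into true missing values before profiling. It rests on an assumption: the codes in `HYPOTHETICAL_MISSING_CODES`, and the function name, are invented for illustration and do not describe the team's actual cleaning rules.

```python
from typing import Iterable, Optional

# Hypothetical sentinel codes: the study's real missing codes are not
# stated in this report.
HYPOTHETICAL_MISSING_CODES = frozenset({-9.0, 999.0, 9999.0})


def remove_missing_codes(
    values: Iterable[Optional[float]],
    codes: frozenset = HYPOTHETICAL_MISSING_CODES,
) -> list[Optional[float]]:
    """Replace sentinel 'missing' codes with None and leave real values untouched."""
    cleaned: list[Optional[float]] = []
    for v in values:
        # None stays None; a value equal to a sentinel becomes None.
        cleaned.append(None if v is None or v in codes else v)
    return cleaned


if __name__ == "__main__":
    raw_cd4 = [350.0, 999.0, 512.0, -9.0]
    print(remove_missing_codes(raw_cd4))  # [350.0, None, 512.0, None]
```

Converting codes to None, rather than deleting rows, keeps each row aligned with the patient's other measurements.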

Alerts

study_source has constant value "JHB_SCHARP_004" [Constant]
Sex has constant value "Male" [Constant]
latitude has constant value "-26.2041" [Constant]
longitude has constant value "28.03" [Constant]
province has constant value "Gauteng" [Constant]
city has constant value "Johannesburg" [Constant]
jhb_subregion has constant value "Soweto" [Constant]
cd4_correction_applied has constant value "0.0" [Constant]
final_comprehensive_fix_applied has constant value "1.0" [Constant]
waist_circ_unit_correction_applied has constant value "False" [Constant]
ALT (U/L) is highly overall correlated with AST (U/L) [High correlation]
AST (U/L) is highly overall correlated with ALT (U/L) [High correlation]
Hematocrit (%) is highly overall correlated with hemoglobin_g_dL [High correlation]
hemoglobin_g_dL is highly overall correlated with Hematocrit (%) [High correlation]
anonymous_patient_id has unique values [Unique]
Patient ID has unique values [Unique]
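The Constant and Unique alerts follow from each variable's distinct-value count. A column with a single distinct value is flagged as constant. A column whose distinct count equals the number of observations is flagged as unique. Distinct (%) is the distinct count divided by the row count: for example, 1 of 101 gives 1.0% and 19 of 101 gives 18.8%. The stdlib sketch below illustrates that rule. It is not ydata-profiling's internal code, and the function name is hypothetical.

```python
from collections.abc import Hashable, Sequence


def distinct_alerts(column: Sequence[Hashable]) -> tuple[int, float, list[str]]:
    """Return (distinct count, distinct %, alerts) for one column."""
    n = len(column)
    distinct = len(set(column))
    distinct_pct = round(100.0 * distinct / n, 1) if n else 0.0
    alerts: list[str] = []
    if distinct == 1:
        alerts.append("Constant")
    if n > 1 and distinct == n:  # every row holds a different value
        alerts.append("Unique")
    return distinct, distinct_pct, alerts


if __name__ == "__main__":
    print(distinct_alerts(["Male"] * 101))    # (1, 1.0, ['Constant'])
    print(distinct_alerts(list(range(101))))  # (101, 100.0, ['Unique'])
```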

Reproduction

Analysis started: 2025-11-24 21:49:42.221460
Analysis finished: 2025-11-24 21:49:46.692082
Duration: 4.47 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json

Variables

anonymous_patient_id
Text

Unique

Distinct: 101
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Length

Max length: 15
Median length: 14
Mean length: 14.188119
Min length: 12

Characters and Unicode

Total characters: 1433
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
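The category names used in the character tables (Decimal Number, Uppercase Letter, Connector Punctuation and so on) are Unicode General Category values. Python's standard `unicodedata.category` returns the two-letter code for each character. The sketch below maps the codes that occur in this report to their long names and counts them. `unicodedata` does not expose the Script property (Latin, Common), so the sketch covers only categories and membership of the ASCII block.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode General Category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def all_ascii(values: list[str]) -> bool:
    """True if every character lies in the ASCII block (U+0000..U+007F)."""
    return all(ord(ch) < 128 for value in values for ch in value)


if __name__ == "__main__":
    print(category_counts(["SCHARP_004_6"]))
    print(category_counts(["-26.2041"]))
    print(all_ascii(["Johannesburg", "Soweto"]))  # True
```

Applied to the 101 "Male" values, the counts come out as 303 lowercase and 101 uppercase letters. These match the totals reported for Sex.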

Unique

Unique: 101
Unique (%): 100.0%

Sample

1st row: SCHARP_004_6
2nd row: SCHARP_004_10
3rd row: SCHARP_004_24
4th row: SCHARP_004_25
5th row: SCHARP_004_34

Words

Value | Count | Frequency (%)
scharp_004_6 | 1 | 1.0%
scharp_004_513 | 1 | 1.0%
scharp_004_24 | 1 | 1.0%
scharp_004_25 | 1 | 1.0%
scharp_004_34 | 1 | 1.0%
scharp_004_37 | 1 | 1.0%
scharp_004_39 | 1 | 1.0%
scharp_004_57 | 1 | 1.0%
scharp_004_69 | 1 | 1.0%
scharp_004_71 | 1 | 1.0%
Other values (91) | 91 | 90.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 216 | 15.1%
_ | 202 | 14.1%
4 | 130 | 9.1%
P | 101 | 7.0%
C | 101 | 7.0%
S | 101 | 7.0%
R | 101 | 7.0%
A | 101 | 7.0%
H | 101 | 7.0%
1 | 67 | 4.7%
Other values (7) | 212 | 14.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 625 | 43.6%
Uppercase Letter | 606 | 42.3%
Connector Punctuation | 202 | 14.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 216 | 34.6%
4 | 130 | 20.8%
1 | 67 | 10.7%
5 | 41 | 6.6%
2 | 39 | 6.2%
6 | 33 | 5.3%
3 | 29 | 4.6%
7 | 25 | 4.0%
9 | 24 | 3.8%
8 | 21 | 3.4%
Uppercase Letter
Value | Count | Frequency (%)
P | 101 | 16.7%
C | 101 | 16.7%
S | 101 | 16.7%
R | 101 | 16.7%
A | 101 | 16.7%
H | 101 | 16.7%
Connector Punctuation
Value | Count | Frequency (%)
_ | 202 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 827 | 57.7%
Latin | 606 | 42.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 216 | 26.1%
_ | 202 | 24.4%
4 | 130 | 15.7%
1 | 67 | 8.1%
5 | 41 | 5.0%
2 | 39 | 4.7%
6 | 33 | 4.0%
3 | 29 | 3.5%
7 | 25 | 3.0%
9 | 24 | 2.9%
Latin
Value | Count | Frequency (%)
P | 101 | 16.7%
C | 101 | 16.7%
S | 101 | 16.7%
R | 101 | 16.7%
A | 101 | 16.7%
H | 101 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1433 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 216 | 15.1%
_ | 202 | 14.1%
4 | 130 | 9.1%
P | 101 | 7.0%
C | 101 | 7.0%
S | 101 | 7.0%
R | 101 | 7.0%
A | 101 | 7.0%
H | 101 | 7.0%
1 | 67 | 4.7%
Other values (7) | 212 | 14.8%

Patient ID
Real number (ℝ)

Unique 

Distinct: 101
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 719.87129
Minimum: 6
Maximum: 1992
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 6
5th percentile: 37
Q1: 235
Median: 533
Q3: 1140
95th percentile: 1884
Maximum: 1992
Range: 1986
Interquartile range (IQR): 905

Descriptive statistics

Standard deviation: 598.36806
Coefficient of variation (CV): 0.83121534
Kurtosis: -0.6813023
Mean: 719.87129
Median Absolute Deviation (MAD): 342
Skewness: 0.77670919
Sum: 72707
Variance: 358044.33
Monotonicity: Strictly increasing
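The quantile and dispersion rows are linked by simple formulas. Range = Maximum - Minimum, IQR = Q3 - Q1, and CV = standard deviation / mean. For Patient ID, 1140 - 235 = 905 and 598.36806 / 719.87129 ≈ 0.8312. The percentiles are consistent with linear interpolation at position (n - 1) × q, which is the default of pandas `Series.quantile`. With 101 rows, the 5th percentile therefore falls exactly on the 6th-smallest value, which is 37 here. The stdlib sketch below assumes that interpolation rule and the n - 1 sample standard deviation. The function names are illustrative.

```python
import math
import statistics


def quantile_linear(sorted_values: list[float], q: float) -> float:
    """Quantile by linear interpolation at position (n - 1) * q."""
    if not sorted_values:
        raise ValueError("empty input")
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def summary(values: list[float]) -> dict[str, float]:
    """Quantile statistics plus range, IQR and coefficient of variation."""
    xs = sorted(values)
    q1 = quantile_linear(xs, 0.25)
    q3 = quantile_linear(xs, 0.75)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1)
    return {
        "min": xs[0],
        "p5": quantile_linear(xs, 0.05),
        "q1": q1,
        "median": quantile_linear(xs, 0.5),
        "q3": q3,
        "p95": quantile_linear(xs, 0.95),
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "cv": std / mean,
    }


if __name__ == "__main__":
    print(summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```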
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
6 | 1 | 1.0%
758 | 1 | 1.0%
1107 | 1 | 1.0%
1081 | 1 | 1.0%
1059 | 1 | 1.0%
1051 | 1 | 1.0%
954 | 1 | 1.0%
872 | 1 | 1.0%
813 | 1 | 1.0%
811 | 1 | 1.0%
Other values (91) | 91 | 90.1%

Minimum 10 values

Value | Count | Frequency (%)
6 | 1 | 1.0%
10 | 1 | 1.0%
24 | 1 | 1.0%
25 | 1 | 1.0%
34 | 1 | 1.0%
37 | 1 | 1.0%
39 | 1 | 1.0%
57 | 1 | 1.0%
69 | 1 | 1.0%
71 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
1992 | 1 | 1.0%
1952 | 1 | 1.0%
1941 | 1 | 1.0%
1923 | 1 | 1.0%
1917 | 1 | 1.0%
1884 | 1 | 1.0%
1873 | 1 | 1.0%
1850 | 1 | 1.0%
1751 | 1 | 1.0%
1750 | 1 | 1.0%
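Patient ID is reported as strictly increasing. This means the rows are ordered by Patient ID and no value repeats. Every other numeric variable is reported as not monotonic. The stdlib sketch below shows such a check. The labels for the non-strict cases ("Increasing", "Decreasing") are an assumption, and the function name is hypothetical.

```python
def monotonicity(values: list[float]) -> str:
    """Classify a sequence with labels similar to those used in the report."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


if __name__ == "__main__":
    print(monotonicity([6, 10, 24, 25, 34]))  # Strictly increasing
    print(monotonicity([21, 20, 18, 26]))     # Not monotonic
```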

study_source
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB
JHB_SCHARP_004: 101

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 1414
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: JHB_SCHARP_004
2nd row: JHB_SCHARP_004
3rd row: JHB_SCHARP_004
4th row: JHB_SCHARP_004
5th row: JHB_SCHARP_004

Common Values

Value | Count | Frequency (%)
JHB_SCHARP_004 | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
jhb_scharp_004 | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
H | 202 | 14.3%
_ | 202 | 14.3%
0 | 202 | 14.3%
J | 101 | 7.1%
B | 101 | 7.1%
S | 101 | 7.1%
C | 101 | 7.1%
A | 101 | 7.1%
R | 101 | 7.1%
P | 101 | 7.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 909 | 64.3%
Decimal Number | 303 | 21.4%
Connector Punctuation | 202 | 14.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
H | 202 | 22.2%
J | 101 | 11.1%
B | 101 | 11.1%
S | 101 | 11.1%
C | 101 | 11.1%
A | 101 | 11.1%
R | 101 | 11.1%
P | 101 | 11.1%
Decimal Number
Value | Count | Frequency (%)
0 | 202 | 66.7%
4 | 101 | 33.3%
Connector Punctuation
Value | Count | Frequency (%)
_ | 202 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 909 | 64.3%
Common | 505 | 35.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
H | 202 | 22.2%
J | 101 | 11.1%
B | 101 | 11.1%
S | 101 | 11.1%
C | 101 | 11.1%
A | 101 | 11.1%
R | 101 | 11.1%
P | 101 | 11.1%
Common
Value | Count | Frequency (%)
_ | 202 | 40.0%
0 | 202 | 40.0%
4 | 101 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1414 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
H | 202 | 14.3%
_ | 202 | 14.3%
0 | 202 | 14.3%
J | 101 | 7.1%
B | 101 | 7.1%
S | 101 | 7.1%
C | 101 | 7.1%
A | 101 | 7.1%
R | 101 | 7.1%
P | 101 | 7.1%
Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Minimum: 2015-01-01 00:00:00
Maximum: 2016-01-01 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=2)

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Minimum: 2015-01-01 00:00:00
Maximum: 2016-01-01 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=2)

Age (at enrolment)
Real number (ℝ)

Patient age at study enrollment

Distinct: 19
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.19802
Minimum: 18
Maximum: 42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 18
5th percentile: 18
Q1: 20
Median: 22
Q3: 26
95th percentile: 32
Maximum: 42
Range: 24
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.6883255
Coefficient of variation (CV): 0.20210025
Kurtosis: 3.2769758
Mean: 23.19802
Median Absolute Deviation (MAD): 3
Skewness: 1.5880952
Sum: 2343
Variance: 21.980396
Monotonicity: Not monotonic
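The MAD reported here is the unscaled median absolute deviation, median(|x - median(x)|), with no 1.4826 normal-consistency factor. The Age value-count tables in this section list all 101 rows: the counts sum to 101 and the values sum to 2343, which is the reported Sum. The figure can therefore be checked directly, as in this stdlib sketch. For a right-skewed variable such as ALT (U/L) (SD 20.75, MAD 6), the gap between the two measures shows how much a few high values inflate the standard deviation.

```python
import statistics

# Age value counts copied from the Age tables in this section;
# together they cover all 101 observations.
AGE_COUNTS = {18: 11, 19: 10, 20: 12, 21: 13, 22: 9, 23: 6, 24: 8, 25: 6,
              26: 10, 27: 2, 28: 4, 29: 2, 31: 2, 32: 1, 33: 1, 34: 1,
              38: 1, 39: 1, 42: 1}


def expand_counts(counts: dict[int, int]) -> list[int]:
    """Turn a value -> count mapping back into one entry per row."""
    return [value for value, n in counts.items() for _ in range(n)]


def median_absolute_deviation(values: list[float]) -> float:
    """Unscaled MAD: the median of |x - median(x)|."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])


ages = expand_counts(AGE_COUNTS)
print(len(ages), sum(ages))              # 101 2343
print(statistics.median(ages))           # 22
print(median_absolute_deviation(ages))   # 3
```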
Histogram with fixed size bins (bins=19)

Common Values

Value | Count | Frequency (%)
21 | 13 | 12.9%
20 | 12 | 11.9%
18 | 11 | 10.9%
26 | 10 | 9.9%
19 | 10 | 9.9%
22 | 9 | 8.9%
24 | 8 | 7.9%
25 | 6 | 5.9%
23 | 6 | 5.9%
28 | 4 | 4.0%
Other values (9) | 12 | 11.9%

Minimum 10 values

Value | Count | Frequency (%)
18 | 11 | 10.9%
19 | 10 | 9.9%
20 | 12 | 11.9%
21 | 13 | 12.9%
22 | 9 | 8.9%
23 | 6 | 5.9%
24 | 8 | 7.9%
25 | 6 | 5.9%
26 | 10 | 9.9%
27 | 2 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
42 | 1 | 1.0%
39 | 1 | 1.0%
38 | 1 | 1.0%
34 | 1 | 1.0%
33 | 1 | 1.0%
32 | 1 | 1.0%
31 | 2 | 2.0%
29 | 2 | 2.0%
28 | 4 | 4.0%
27 | 2 | 2.0%
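The histogram bin counts in this report follow min(50, number of distinct values). Age gets 19 bins (19 distinct values), hemoglobin_g_dL 44, ALT (U/L) 38, AST (U/L) 29 and the date variables 2. Every variable with at least 50 distinct values gets 50. This rule is inferred from the report, and 50 appears to be the configured maximum. The stdlib sketch below computes the bin count and the equal-width ("fixed size") bin edges. The function names are hypothetical.

```python
def bin_count(values: list[float], max_bins: int = 50) -> int:
    """At most max_bins bins, and never more bins than distinct values."""
    return max(1, min(max_bins, len(set(values))))


def equal_width_edges(values: list[float], bins: int) -> list[float]:
    """Edges of `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # bins + 1 edges; the last one is pinned to the maximum.
    return [lo + i * width for i in range(bins)] + [hi]


if __name__ == "__main__":
    age_values = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
                  28, 29, 31, 32, 33, 34, 38, 39, 42]
    print(bin_count(age_values))                    # 19
    print(bin_count([float(i) for i in range(101)]))  # 50
    print(equal_width_edges([18, 42], 4))           # [18.0, 24.0, 30.0, 36.0, 42]
```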

Sex
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB
Male: 101

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 404
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Common Values

Value | Count | Frequency (%)
Male | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
male | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 101 | 25.0%
a | 101 | 25.0%
l | 101 | 25.0%
e | 101 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 303 | 75.0%
Uppercase Letter | 101 | 25.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 101 | 33.3%
l | 101 | 33.3%
e | 101 | 33.3%
Uppercase Letter
Value | Count | Frequency (%)
M | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 404 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 101 | 25.0%
a | 101 | 25.0%
l | 101 | 25.0%
e | 101 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 404 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 101 | 25.0%
a | 101 | 25.0%
l | 101 | 25.0%
e | 101 | 25.0%

latitude
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB
-26.2041: 101

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 808
Distinct characters: 7
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -26.2041
2nd row: -26.2041
3rd row: -26.2041
4th row: -26.2041
5th row: -26.2041

Common Values

Value | Count | Frequency (%)
-26.2041 | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
26.2041 | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 202 | 25.0%
- | 101 | 12.5%
6 | 101 | 12.5%
. | 101 | 12.5%
0 | 101 | 12.5%
4 | 101 | 12.5%
1 | 101 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 606 | 75.0%
Dash Punctuation | 101 | 12.5%
Other Punctuation | 101 | 12.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 202 | 33.3%
6 | 101 | 16.7%
0 | 101 | 16.7%
4 | 101 | 16.7%
1 | 101 | 16.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 101 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 808 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 202 | 25.0%
- | 101 | 12.5%
6 | 101 | 12.5%
. | 101 | 12.5%
0 | 101 | 12.5%
4 | 101 | 12.5%
1 | 101 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 808 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 202 | 25.0%
- | 101 | 12.5%
6 | 101 | 12.5%
. | 101 | 12.5%
0 | 101 | 12.5%
4 | 101 | 12.5%
1 | 101 | 12.5%

longitude
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB
28.03: 101

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 505
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 28.03
2nd row: 28.03
3rd row: 28.03
4th row: 28.03
5th row: 28.03

Common Values

Value | Count | Frequency (%)
28.03 | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
28.03 | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 101 | 20.0%
8 | 101 | 20.0%
. | 101 | 20.0%
0 | 101 | 20.0%
3 | 101 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 404 | 80.0%
Other Punctuation | 101 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 101 | 25.0%
8 | 101 | 25.0%
0 | 101 | 25.0%
3 | 101 | 25.0%
Other Punctuation
Value | Count | Frequency (%)
. | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 505 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 101 | 20.0%
8 | 101 | 20.0%
. | 101 | 20.0%
0 | 101 | 20.0%
3 | 101 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 505 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 101 | 20.0%
8 | 101 | 20.0%
. | 101 | 20.0%
0 | 101 | 20.0%
3 | 101 | 20.0%

province
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB
Gauteng: 101

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 707
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Gauteng
2nd row: Gauteng
3rd row: Gauteng
4th row: Gauteng
5th row: Gauteng

Common Values

Value | Count | Frequency (%)
Gauteng | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
gauteng | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
G | 101 | 14.3%
a | 101 | 14.3%
u | 101 | 14.3%
t | 101 | 14.3%
e | 101 | 14.3%
n | 101 | 14.3%
g | 101 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 606 | 85.7%
Uppercase Letter | 101 | 14.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 101 | 16.7%
u | 101 | 16.7%
t | 101 | 16.7%
e | 101 | 16.7%
n | 101 | 16.7%
g | 101 | 16.7%
Uppercase Letter
Value | Count | Frequency (%)
G | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 707 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
G | 101 | 14.3%
a | 101 | 14.3%
u | 101 | 14.3%
t | 101 | 14.3%
e | 101 | 14.3%
n | 101 | 14.3%
g | 101 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 707 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
G | 101 | 14.3%
a | 101 | 14.3%
u | 101 | 14.3%
t | 101 | 14.3%
e | 101 | 14.3%
n | 101 | 14.3%
g | 101 | 14.3%

city
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
Johannesburg: 101

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1212
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Johannesburg
2nd row: Johannesburg
3rd row: Johannesburg
4th row: Johannesburg
5th row: Johannesburg

Common Values

Value | Count | Frequency (%)
Johannesburg | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
johannesburg | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
n | 202 | 16.7%
J | 101 | 8.3%
o | 101 | 8.3%
h | 101 | 8.3%
a | 101 | 8.3%
e | 101 | 8.3%
s | 101 | 8.3%
b | 101 | 8.3%
u | 101 | 8.3%
r | 101 | 8.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1111 | 91.7%
Uppercase Letter | 101 | 8.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 202 | 18.2%
o | 101 | 9.1%
h | 101 | 9.1%
a | 101 | 9.1%
e | 101 | 9.1%
s | 101 | 9.1%
b | 101 | 9.1%
u | 101 | 9.1%
r | 101 | 9.1%
g | 101 | 9.1%
Uppercase Letter
Value | Count | Frequency (%)
J | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1212 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 202 | 16.7%
J | 101 | 8.3%
o | 101 | 8.3%
h | 101 | 8.3%
a | 101 | 8.3%
e | 101 | 8.3%
s | 101 | 8.3%
b | 101 | 8.3%
u | 101 | 8.3%
r | 101 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1212 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 202 | 16.7%
J | 101 | 8.3%
o | 101 | 8.3%
h | 101 | 8.3%
a | 101 | 8.3%
e | 101 | 8.3%
s | 101 | 8.3%
b | 101 | 8.3%
u | 101 | 8.3%
r | 101 | 8.3%

jhb_subregion
Categorical

Constant 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB
Soweto: 101

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 606
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Soweto
2nd row: Soweto
3rd row: Soweto
4th row: Soweto
5th row: Soweto

Common Values

Value | Count | Frequency (%)
Soweto | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
soweto | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
o | 202 | 33.3%
S | 101 | 16.7%
w | 101 | 16.7%
e | 101 | 16.7%
t | 101 | 16.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 505 | 83.3%
Uppercase Letter | 101 | 16.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 202 | 40.0%
w | 101 | 20.0%
e | 101 | 20.0%
t | 101 | 20.0%
Uppercase Letter
Value | Count | Frequency (%)
S | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 606 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 202 | 33.3%
S | 101 | 16.7%
w | 101 | 16.7%
e | 101 | 16.7%
t | 101 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 606 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 202 | 33.3%
S | 101 | 16.7%
w | 101 | 16.7%
e | 101 | 16.7%
t | 101 | 16.7%

Hematocrit (%)
Real number (ℝ)

High correlation 

Hematocrit (zero values removed)

Distinct: 70
Distinct (%): 69.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.6
Minimum: 29.7
Maximum: 54.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 29.7
5th percentile: 41.8
Q1: 44.4
Median: 46.6
Q3: 48.6
95th percentile: 51.3
Maximum: 54.7
Range: 25
Interquartile range (IQR): 4.2

Descriptive statistics

Standard deviation: 3.4788216
Coefficient of variation (CV): 0.074652825
Kurtosis: 4.4641552
Mean: 46.6
Median Absolute Deviation (MAD): 2.2
Skewness: -0.97576448
Sum: 4706.6
Variance: 12.1022
Monotonicity: Not monotonic
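Skewness and kurtosis appear to be pandas' bias-corrected sample estimators, as computed by `Series.skew()` and `Series.kurt()`. Kurtosis is reported as excess kurtosis, which is about 0 for normally distributed data. The report does not name the estimator. However, rebuilding Age from its value-count tables and applying the formulas below reproduces the reported Age skewness (1.5880952) and kurtosis (3.2769758), which supports this reading. For Hematocrit, the negative skewness (-0.98) and kurtosis of 4.46 likely reflect the single low value of 29.7, which lies well below the 5th percentile of 41.8. A stdlib sketch:

```python
import math


def _central_moments(values: list[float]) -> tuple[int, float, float, float]:
    """n and the 2nd, 3rd and 4th central moments (divided by n)."""
    n = len(values)
    mean = sum(values) / n
    d = [x - mean for x in values]
    m2 = sum(v * v for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    return n, m2, m3, m4


def sample_skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-constant data)."""
    n, m2, m3, _ = _central_moments(values)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(values: list[float]) -> float:
    """Bias-corrected excess kurtosis G2 (needs n >= 4 and non-constant data)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2 ** 2 - 3.0
    return ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))


if __name__ == "__main__":
    print(sample_skewness([1, 2, 3, 4, 5]))         # 0.0
    print(sample_excess_kurtosis([1, 2, 3, 4, 5]))  # approximately -1.2
```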
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
48 | 4 | 4.0%
46.9 | 3 | 3.0%
46.2 | 3 | 3.0%
50.9 | 3 | 3.0%
43.6 | 3 | 3.0%
42.6 | 3 | 3.0%
45.5 | 3 | 3.0%
48.3 | 3 | 3.0%
50.8 | 3 | 3.0%
44.5 | 2 | 2.0%
Other values (60) | 71 | 70.3%

Minimum 10 values

Value | Count | Frequency (%)
29.7 | 1 | 1.0%
39.9 | 1 | 1.0%
40.4 | 1 | 1.0%
41.4 | 1 | 1.0%
41.7 | 1 | 1.0%
41.8 | 1 | 1.0%
42.3 | 1 | 1.0%
42.5 | 1 | 1.0%
42.6 | 3 | 3.0%
42.8 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
54.7 | 1 | 1.0%
53.7 | 1 | 1.0%
53.3 | 1 | 1.0%
52 | 1 | 1.0%
51.4 | 1 | 1.0%
51.3 | 1 | 1.0%
51.2 | 1 | 1.0%
50.9 | 3 | 3.0%
50.8 | 3 | 3.0%
50.4 | 1 | 1.0%

Platelet count (×10³/µL)
Real number (ℝ)

Platelet count (missing codes removed)

Distinct: 81
Distinct (%): 80.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 278.15842
Minimum: 142
Maximum: 438
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 142
5th percentile: 201
Q1: 241
Median: 270
Q3: 315
95th percentile: 374
Maximum: 438
Range: 296
Interquartile range (IQR): 74

Descriptive statistics

Standard deviation: 54.837347
Coefficient of variation (CV): 0.1971443
Kurtosis: 0.36896072
Mean: 278.15842
Median Absolute Deviation (MAD): 38
Skewness: 0.48248693
Sum: 28094
Variance: 3007.1347
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
266 | 4 | 4.0%
234 | 2 | 2.0%
294 | 2 | 2.0%
311 | 2 | 2.0%
232 | 2 | 2.0%
261 | 2 | 2.0%
253 | 2 | 2.0%
248 | 2 | 2.0%
211 | 2 | 2.0%
265 | 2 | 2.0%
Other values (71) | 79 | 78.2%

Minimum 10 values

Value | Count | Frequency (%)
142 | 1 | 1.0%
179 | 1 | 1.0%
194 | 1 | 1.0%
195 | 1 | 1.0%
197 | 1 | 1.0%
201 | 1 | 1.0%
206 | 1 | 1.0%
209 | 2 | 2.0%
211 | 2 | 2.0%
214 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
438 | 1 | 1.0%
436 | 1 | 1.0%
397 | 1 | 1.0%
383 | 1 | 1.0%
377 | 1 | 1.0%
374 | 1 | 1.0%
373 | 1 | 1.0%
370 | 1 | 1.0%
357 | 1 | 1.0%
347 | 1 | 1.0%

hemoglobin_g_dL
Real number (ℝ)

High correlation 

Distinct: 44
Distinct (%): 43.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.659406
Minimum: 10.2
Maximum: 18.4
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 10.2
5th percentile: 13.8
Q1: 14.9
Median: 15.7
Q3: 16.6
95th percentile: 17.8
Maximum: 18.4
Range: 8.2
Interquartile range (IQR): 1.7

Descriptive statistics

Standard deviation: 1.316676
Coefficient of variation (CV): 0.084082116
Kurtosis: 2.114995
Mean: 15.659406
Median Absolute Deviation (MAD): 0.9
Skewness: -0.71216193
Sum: 1581.6
Variance: 1.7336356
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Common Values

Value | Count | Frequency (%)
15.7 | 7 | 6.9%
15.8 | 6 | 5.9%
16.2 | 5 | 5.0%
16.5 | 4 | 4.0%
15.2 | 4 | 4.0%
16.6 | 4 | 4.0%
16 | 3 | 3.0%
14.4 | 3 | 3.0%
15.5 | 3 | 3.0%
14.2 | 3 | 3.0%
Other values (34) | 59 | 58.4%

Minimum 10 values

Value | Count | Frequency (%)
10.2 | 1 | 1.0%
12.3 | 1 | 1.0%
13.2 | 1 | 1.0%
13.3 | 1 | 1.0%
13.4 | 1 | 1.0%
13.8 | 1 | 1.0%
13.9 | 2 | 2.0%
14 | 1 | 1.0%
14.1 | 1 | 1.0%
14.2 | 3 | 3.0%

Maximum 10 values

Value | Count | Frequency (%)
18.4 | 1 | 1.0%
18.3 | 1 | 1.0%
18.1 | 1 | 1.0%
17.9 | 2 | 2.0%
17.8 | 1 | 1.0%
17.6 | 1 | 1.0%
17.4 | 2 | 2.0%
17.1 | 2 | 2.0%
17 | 3 | 3.0%
16.9 | 3 | 3.0%
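The report flags two pairs as highly correlated: Hematocrit (%) with hemoglobin_g_dL, and ALT (U/L) with AST (U/L). It prints neither the coefficients nor the cut-off. The sketch below assumes a Spearman rank correlation, which is a common choice for numeric pairs in profiling tools. The coefficient and threshold actually used by this configuration are not stated here; they may be recorded in the downloadable config.json. Spearman's rho is computed as Pearson's r on average ranks, so tied values (frequent in these lab columns) share a rank.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
    print(spearman([1, 2, 3, 4], [10, 20, 30, 45]))  # 1.0
```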

Lymphocyte count (×10⁹/L)
Real number (ℝ)

Lymphocyte absolute count (corrected labeling)

Distinct: 78
Distinct (%): 77.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0021782
Minimum: 0.77
Maximum: 4.31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 0.77
5th percentile: 1.05
Q1: 1.56
Median: 1.95
Q3: 2.36
95th percentile: 3.09
Maximum: 4.31
Range: 3.54
Interquartile range (IQR): 0.8

Descriptive statistics

Standard deviation: 0.63511984
Coefficient of variation (CV): 0.31721444
Kurtosis: 0.89462595
Mean: 2.0021782
Median Absolute Deviation (MAD): 0.4
Skewness: 0.6877214
Sum: 202.22
Variance: 0.40337721
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
1.72 | 3 | 3.0%
2.41 | 3 | 3.0%
2.35 | 3 | 3.0%
1.56 | 3 | 3.0%
1.63 | 3 | 3.0%
1.54 | 2 | 2.0%
2.36 | 2 | 2.0%
1.67 | 2 | 2.0%
1.05 | 2 | 2.0%
1.88 | 2 | 2.0%
Other values (68) | 76 | 75.2%

Minimum 10 values

Value | Count | Frequency (%)
0.77 | 1 | 1.0%
0.96 | 1 | 1.0%
0.97 | 1 | 1.0%
1 | 1 | 1.0%
1.05 | 2 | 2.0%
1.12 | 1 | 1.0%
1.13 | 1 | 1.0%
1.16 | 1 | 1.0%
1.19 | 1 | 1.0%
1.2 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
4.31 | 1 | 1.0%
3.67 | 1 | 1.0%
3.33 | 1 | 1.0%
3.13 | 1 | 1.0%
3.1 | 1 | 1.0%
3.09 | 1 | 1.0%
3.05 | 1 | 1.0%
2.99 | 1 | 1.0%
2.95 | 1 | 1.0%
2.81 | 1 | 1.0%

Neutrophil count (×10⁹/L)
Real number (ℝ)

Neutrophil absolute count (corrected labeling)

Distinct: 91
Distinct (%): 90.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.0844554
Minimum: 1.2
Maximum: 9.68
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 1.2
5th percentile: 1.47
Q1: 1.99
Median: 2.67
Q3: 3.77
95th percentile: 6.22
Maximum: 9.68
Range: 8.48
Interquartile range (IQR): 1.78

Descriptive statistics

Standard deviation: 1.5919375
Coefficient of variation (CV): 0.51611622
Kurtosis: 3.2682385
Mean: 3.0844554
Median Absolute Deviation (MAD): 0.85
Skewness: 1.6363613
Sum: 311.53
Variance: 2.534265
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
5.38 | 3 | 3.0%
2.73 | 3 | 3.0%
1.68 | 3 | 3.0%
2.1 | 2 | 2.0%
1.99 | 2 | 2.0%
2.67 | 2 | 2.0%
2.19 | 2 | 2.0%
3.31 | 1 | 1.0%
2.21 | 1 | 1.0%
3.58 | 1 | 1.0%
Other values (81) | 81 | 80.2%

Minimum 10 values

Value | Count | Frequency (%)
1.2 | 1 | 1.0%
1.36 | 1 | 1.0%
1.37 | 1 | 1.0%
1.39 | 1 | 1.0%
1.46 | 1 | 1.0%
1.47 | 1 | 1.0%
1.48 | 1 | 1.0%
1.53 | 1 | 1.0%
1.55 | 1 | 1.0%
1.56 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
9.68 | 1 | 1.0%
8.35 | 1 | 1.0%
7.25 | 1 | 1.0%
7.05 | 1 | 1.0%
6.88 | 1 | 1.0%
6.22 | 1 | 1.0%
5.62 | 1 | 1.0%
5.38 | 3 | 3.0%
4.98 | 1 | 1.0%
4.91 | 1 | 1.0%

ALT (U/L)
Real number (ℝ)

High correlation 

Alanine aminotransferase (missing codes removed)

Distinct: 38
Distinct (%): 37.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.693069
Minimum: 6
Maximum: 157
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 6
5th percentile: 9
Q1: 11
Median: 17
Q3: 23
95th percentile: 57
Maximum: 157
Range: 151
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 20.74596
Coefficient of variation (CV): 0.91419806
Kurtosis: 18.64885
Mean: 22.693069
Median Absolute Deviation (MAD): 6
Skewness: 3.7291486
Sum: 2292
Variance: 430.39485
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=38)

Common Values

Value | Count | Frequency (%)
11 | 9 | 8.9%
10 | 9 | 8.9%
13 | 7 | 6.9%
18 | 6 | 5.9%
15 | 6 | 5.9%
19 | 6 | 5.9%
9 | 5 | 5.0%
16 | 5 | 5.0%
21 | 4 | 4.0%
12 | 4 | 4.0%
Other values (28) | 40 | 39.6%

Minimum 10 values

Value | Count | Frequency (%)
6 | 2 | 2.0%
8 | 1 | 1.0%
9 | 5 | 5.0%
10 | 9 | 8.9%
11 | 9 | 8.9%
12 | 4 | 4.0%
13 | 7 | 6.9%
14 | 2 | 2.0%
15 | 6 | 5.9%
16 | 5 | 5.0%

Maximum 10 values

Value | Count | Frequency (%)
157 | 1 | 1.0%
93 | 1 | 1.0%
89 | 1 | 1.0%
63 | 1 | 1.0%
61 | 1 | 1.0%
57 | 1 | 1.0%
53 | 1 | 1.0%
52 | 1 | 1.0%
49 | 1 | 1.0%
47 | 1 | 1.0%

AST (U/L)
Real number (ℝ)

High correlation 

Distinct: 29
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.217822
Minimum: 14
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 14
5th percentile: 15
Q1: 19
Median: 23
Q3: 27
95th percentile: 47
Maximum: 100
Range: 86
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 11.34866
Coefficient of variation (CV): 0.45002538
Kurtosis: 18.547589
Mean: 25.217822
Median Absolute Deviation (MAD): 4
Skewness: 3.469433
Sum: 2547
Variance: 128.79208
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=29)

Common Values

Value | Count | Frequency (%)
23 | 9 | 8.9%
20 | 9 | 8.9%
19 | 9 | 8.9%
25 | 8 | 7.9%
17 | 6 | 5.9%
21 | 5 | 5.0%
15 | 5 | 5.0%
16 | 5 | 5.0%
26 | 5 | 5.0%
27 | 4 | 4.0%
Other values (19) | 36 | 35.6%

Minimum 10 values

Value | Count | Frequency (%)
14 | 2 | 2.0%
15 | 5 | 5.0%
16 | 5 | 5.0%
17 | 6 | 5.9%
18 | 4 | 4.0%
19 | 9 | 8.9%
20 | 9 | 8.9%
21 | 5 | 5.0%
22 | 3 | 3.0%
23 | 9 | 8.9%

Maximum 10 values

Value | Count | Frequency (%)
100 | 1 | 1.0%
54 | 1 | 1.0%
53 | 1 | 1.0%
49 | 1 | 1.0%
48 | 1 | 1.0%
47 | 1 | 1.0%
41 | 1 | 1.0%
40 | 1 | 1.0%
39 | 2 | 2.0%
38 | 2 | 2.0%

cd4_correction_applied
Categorical

Constant 

Quality flag: CD4 missing codes removed

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
0.0: 101

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 303
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
0.0 | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 202 | 66.7%
. | 101 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 202 | 66.7%
Other Punctuation | 101 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 202 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 303 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 202 | 66.7%
. | 101 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 303 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 202 | 66.7%
. | 101 | 33.3%

final_comprehensive_fix_applied
Categorical

Constant 

Quality flag: Comprehensive corrections applied

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
1.0: 101

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 303
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 101 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
1.0 | 101 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 101 | 33.3%
. | 101 | 33.3%
0 | 101 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 202 | 66.7%
Other Punctuation | 101 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 101 | 50.0%
0 | 101 | 50.0%
Other Punctuation
Value | Count | Frequency (%)
. | 101 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 303 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 101 | 33.3%
. | 101 | 33.3%
0 | 101 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 303 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 101 | 33.3%
. | 101 | 33.3%
0 | 101 | 33.3%

waist_circ_unit_correction_applied
Boolean

Constant 

Quality flag: Waist circ unit corrected

Distinct        1
Distinct (%)    1.0%
Missing         0
Missing (%)     0.0%
Memory size     909.0 B

False    101
Value  Count  Frequency (%)
False  101    100.0%

Interactions

[Figure: pairwise interaction plots for the numeric variables]

Correlations

[Figure: correlation heatmap]
                                   (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
(1) ALT (U/L)                    1.000   0.692   0.165   0.038   0.067   0.244  -0.143   0.075   0.059
(2) AST (U/L)                    0.692   1.000   0.149  -0.008  -0.109   0.225  -0.186   0.076  -0.021
(3) Age (at enrolment)           0.165   0.149   1.000  -0.131   0.022   0.076  -0.186  -0.153  -0.140
(4) Hematocrit (%)               0.038  -0.008  -0.131   1.000   0.040   0.154  -0.043   0.107   0.913
(5) Lymphocyte count (×10⁹/L)    0.067  -0.109   0.022   0.040   1.000   0.065   0.088   0.244  -0.019
(6) Neutrophil count (×10⁹/L)    0.244   0.225   0.076   0.154   0.065   1.000  -0.074   0.183   0.102
(7) Patient ID                  -0.143  -0.186  -0.186  -0.043   0.088  -0.074   1.000  -0.051  -0.011
(8) Platelet count (×10³/µL)     0.075   0.076  -0.153   0.107   0.244   0.183  -0.051   1.000   0.020
(9) hemoglobin_g_dL              0.059  -0.021  -0.140   0.913  -0.019   0.102  -0.011   0.020   1.000
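The two High correlation alerts (ALT with AST, Hematocrit with hemoglobin_g_dL) can be recovered from this table by scanning the upper triangle for large coefficients. A minimal sketch, assuming a cut-off of |r| >= 0.5; the report does not state which threshold or coefficient it used, and `high_correlations` is a hypothetical helper. The labels and coefficients are copied from the table above.

```python
# Variable order and coefficients copied from the correlation table.
LABELS = [
    "ALT (U/L)", "AST (U/L)", "Age (at enrolment)", "Hematocrit (%)",
    "Lymphocyte count (×10⁹/L)", "Neutrophil count (×10⁹/L)", "Patient ID",
    "Platelet count (×10³/µL)", "hemoglobin_g_dL",
]
MATRIX = [
    [1.000, 0.692, 0.165, 0.038, 0.067, 0.244, -0.143, 0.075, 0.059],
    [0.692, 1.000, 0.149, -0.008, -0.109, 0.225, -0.186, 0.076, -0.021],
    [0.165, 0.149, 1.000, -0.131, 0.022, 0.076, -0.186, -0.153, -0.140],
    [0.038, -0.008, -0.131, 1.000, 0.040, 0.154, -0.043, 0.107, 0.913],
    [0.067, -0.109, 0.022, 0.040, 1.000, 0.065, 0.088, 0.244, -0.019],
    [0.244, 0.225, 0.076, 0.154, 0.065, 1.000, -0.074, 0.183, 0.102],
    [-0.143, -0.186, -0.186, -0.043, 0.088, -0.074, 1.000, -0.051, -0.011],
    [0.075, 0.076, -0.153, 0.107, 0.244, 0.183, -0.051, 1.000, 0.020],
    [0.059, -0.021, -0.140, 0.913, -0.019, 0.102, -0.011, 0.020, 1.000],
]


def high_correlations(labels, matrix, threshold=0.5):
    """Return (a, b, r) for each pair above the diagonal with |r| >= threshold.

    The threshold of 0.5 is an assumption, not a value stated in the report.
    """
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle only; matrix is symmetric
            r = matrix[i][j]
            if abs(r) >= threshold:
                pairs.append((labels[i], labels[j], r))
    return pairs


for a, b, r in high_correlations(LABELS, MATRIX):
    print(f"{a} <-> {b}: {r:.3f}")
# ALT (U/L) <-> AST (U/L): 0.692
# Hematocrit (%) <-> hemoglobin_g_dL: 0.913
```

With the assumed 0.5 cut-off only the two alerted pairs qualify; the next largest coefficient in the table is 0.244.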

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]
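The nullity-by-column chart is a per-column tally of present versus missing cells. A minimal stdlib sketch, assuming rows are held as Python dicts and that None or float NaN marks a missing cell (the report does not say how the source encodes missing values); `nullity_by_column` and `is_missing` are hypothetical helpers, not the plotting library's API.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows, columns):
    """Return {column: (present, missing)} for a list of dict rows.

    A column absent from a row counts as missing, because row.get()
    returns None for it.
    """
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        summary[col] = (len(rows) - missing, missing)
    return summary


# Two rows copied from the Sample section (a subset of the columns)
sample_rows = [
    {"Patient ID": 6, "Hematocrit (%)": 46.9, "ALT (U/L)": 46.0},
    {"Patient ID": 10, "Hematocrit (%)": 43.6, "ALT (U/L)": 12.0},
]
cols = ["Patient ID", "Hematocrit (%)", "ALT (U/L)"]
print(nullity_by_column(sample_rows, cols))
# {'Patient ID': (2, 0), 'Hematocrit (%)': (2, 0), 'ALT (U/L)': (2, 0)}
```

Every column comes back with zero missing cells, matching the report's "Missing cells 0".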

Sample

Columns that hold the same value in every row shown are omitted from the tables below: study_source (JHB_SCHARP_004), Sex (Male), latitude (-26.2041), longitude (28.03), province (Gauteng), city (Johannesburg), jhb_subregion (Soweto), cd4_correction_applied (0.0), final_comprehensive_fix_applied (1.0) and waist_circ_unit_correction_applied (False). visit_date is identical to primary_date in every row shown, so only primary_date is listed; all timestamps are 00:00:00.

First rows

      anonymous_patient_id  Patient ID  primary_date  Age (at enrolment)  Hematocrit (%)  Platelet count (×10³/µL)  hemoglobin_g_dL  Lymphocyte count (×10⁹/L)  Neutrophil count (×10⁹/L)  ALT (U/L)  AST (U/L)
6128  SCHARP_004_6          6           2015-01-01    28.0                46.9            322.0                     15.7             1.90                       3.31                       46.0       25.0
6129  SCHARP_004_10         10          2016-01-01    26.0                43.6            195.0                     14.9             1.05                       5.38                       12.0       23.0
6130  SCHARP_004_24         24          2015-01-01    27.0                50.8            269.0                     16.9             2.18                       1.84                       61.0       54.0
6131  SCHARP_004_25         25          2016-01-01    19.0                45.9            275.0                     14.9             3.67                       2.33                       15.0       23.0
6132  SCHARP_004_34         34          2015-01-01    31.0                49.3            290.0                     16.5             2.04                       2.61                       15.0       25.0
6133  SCHARP_004_37         37          2015-01-01    23.0                46.0            257.0                     15.8             2.74                       1.99                       23.0       18.0
6134  SCHARP_004_39         39          2015-01-01    22.0                48.8            224.0                     17.0             2.77                       2.38                       22.0       24.0
6135  SCHARP_004_57         57          2015-01-01    22.0                45.9            246.0                     15.2             1.78                       3.92                       25.0       25.0
6136  SCHARP_004_69         69          2015-01-01    24.0                44.5            289.0                     14.8             2.34                       6.22                       32.0       47.0
6137  SCHARP_004_71         71          2015-01-01    26.0                48.0            241.0                     15.8             1.62                       1.68                       21.0       26.0

Last rows

      anonymous_patient_id  Patient ID  primary_date  Age (at enrolment)  Hematocrit (%)  Platelet count (×10³/µL)  hemoglobin_g_dL  Lymphocyte count (×10⁹/L)  Neutrophil count (×10⁹/L)  ALT (U/L)  AST (U/L)
6219  SCHARP_004_1750       1750        2015-01-01    24.0                43.8            197.0                     14.2             2.35                       1.66                       13.0       23.0
6220  SCHARP_004_1751       1751        2015-01-01    21.0                46.9            346.0                     16.5             2.38                       4.74                       28.0       26.0
6221  SCHARP_004_1850       1850        2015-01-01    21.0                46.3            305.0                     15.9             2.09                       1.77                       13.0       21.0
6222  SCHARP_004_1873       1873        2015-01-01    20.0                47.4            278.0                     15.6             2.64                       2.84                       12.0       20.0
6223  SCHARP_004_1884       1884        2016-01-01    18.0                53.7            310.0                     18.3             1.61                       1.72                       22.0       25.0
6224  SCHARP_004_1917       1917        2015-01-01    20.0                50.9            270.0                     17.1             1.98                       8.35                       18.0       22.0
6225  SCHARP_004_1923       1923        2015-01-01    21.0                43.2            245.0                     13.8             2.81                       2.60                       28.0       28.0
6226  SCHARP_004_1941       1941        2016-01-01    18.0                48.9            316.0                     16.9             1.56                       3.79                       52.0       32.0
6227  SCHARP_004_1952       1952        2015-01-01    26.0                40.4            315.0                     13.2             2.43                       2.07                       17.0       29.0
6228  SCHARP_004_1992       1992        2016-01-01    26.0                50.4            238.0                     16.7             2.61                       2.05                       10.0       19.0